Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text
Recently, Transformer-based text detection techniques have sought to predict
polygons by encoding the coordinates of individual boundary vertices using
distinct query features. However, this approach incurs a significant memory
overhead and struggles to effectively capture the intricate relationships
between vertices belonging to the same instance. Consequently, irregular text
layouts often lead to the prediction of outlined vertices, diminishing the
quality of results. To address these challenges, we present an innovative
approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon
prediction. Our method ensures precision by iteratively refining polygon
predictions, considering both the scale and location of preceding results.
Leveraging this stabilized regression pipeline, even employing just a single
feature vector to guide polygon instance regression yields promising detection
results. Simultaneously, the use of instance-level feature proposals
substantially improves memory efficiency (>50% less memory vs. the state-of-the-art
method DPText-DETR) and shortens inference time (>40% less vs. DPText-DETR)
with a minor performance drop on benchmarks.
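The iterative refinement described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-vertex offsets, and the update rule (offsets scaled by the bounding box of the previous polygon) are assumptions chosen only to show how a refinement stage might take both the location and the scale of the preceding result into account.

```python
def refine_polygon(polygon, deltas):
    """One hypothetical refinement stage.

    Each vertex is shifted by an offset that is scaled by the width and
    height of the previous polygon's bounding box (an assumed update rule,
    not one stated in the abstract).
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return [
        (x + dx * width, y + dy * height)
        for (x, y), (dx, dy) in zip(polygon, deltas)
    ]


def cascade_decode(initial_polygon, stage_deltas):
    """Apply the refinement stages in sequence, feeding each result forward."""
    polygon = initial_polygon
    for deltas in stage_deltas:
        polygon = refine_polygon(polygon, deltas)
    return polygon
```

In a real detector the offsets would be regressed by a decoder stage from the instance feature; here they are passed in directly so the cascade structure is visible on its own.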
Fine-grained Audible Video Description
We explore a new task for audio-visual-language modeling called fine-grained
audible video description (FAVD). It aims to provide detailed textual
descriptions for the given audible videos, including the appearance and spatial
locations of each object, the actions of moving objects, and the sounds in
videos. Existing visual-language modeling tasks often concentrate on visual
cues in videos while undervaluing the language and audio modalities. On the
other hand, FAVD requires not only audio-visual-language modeling skills but
also paragraph-level language generation abilities. We construct the first
fine-grained audible video description benchmark (FAVDBench) to facilitate this
research. For each video clip, we first provide a one-sentence summary of the
video, i.e., the caption, followed by 4-6 sentences describing the visual details
and 1-2 audio-related descriptions at the end. The descriptions are provided in
both English and Chinese. We create two new metrics for this task: an
EntityScore to gauge the completeness of entities in the visual descriptions,
and an AudioScore to assess the audio descriptions. As a preliminary approach
to this task, we propose an audio-visual-language transformer that extends
an existing video captioning model with an additional audio branch. We combine the
masked language modeling and auto-regressive language modeling losses to
optimize our model so that it can produce paragraph-level descriptions. We
illustrate the effectiveness of our model in audio-visual-language modeling by
evaluating it against the proposed benchmark using both conventional captioning
metrics and our proposed metrics. We further put our benchmark to the test in
video generation models, demonstrating that employing fine-grained video
descriptions can create more intricate videos than using captions.
Comment: accepted to CVPR 2023; Xuyang Shen, Dong Li and Jinxing Zhou
contributed equally; code link: github.com/OpenNLPLab/FAVDBench; dataset link:
www.avlbench.opennlplab.c
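The combination of masked and auto-regressive language modeling losses mentioned in this abstract can be sketched as a weighted sum of two token-level cross-entropy terms. This is an illustration only: the abstract does not state how the two losses are weighted, so the equal default weights, the function names, and the toy probability inputs are all assumptions.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target token under one distribution."""
    return -math.log(probs[target_index])


def mean_token_loss(step_probs, targets):
    """Average cross-entropy over a sequence of predicted distributions."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, targets)]
    return sum(losses) / len(losses)


def combined_loss(mlm_probs, mlm_targets, ar_probs, ar_targets,
                  mlm_weight=1.0, ar_weight=1.0):
    """Weighted sum of the two objectives (the weights are hypothetical)."""
    mlm = mean_token_loss(mlm_probs, mlm_targets)  # masked positions only
    ar = mean_token_loss(ar_probs, ar_targets)     # next-token predictions
    return mlm_weight * mlm + ar_weight * ar
```

Here `mlm_probs` would hold the predicted distributions at masked positions and `ar_probs` those for next-token prediction; both are plain lists of probabilities in this sketch.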
Oxidized sulfur-rich arc magmas formed porphyry Cu deposits by 1.88 Ga
Most known porphyry Cu deposits formed in the Phanerozoic and are exclusively associated with moderately oxidized, sulfur-rich, hydrous arc-related magmas derived from partial melting of asthenospheric mantle metasomatized by slab-derived fluids. Yet whether similar metallogenic processes also operated in the Precambrian remains obscure. Here we address the issue by investigating the origin, fO2, and S contents of calc-alkaline plutonic rocks associated with the Haib porphyry Cu deposit in the Paleoproterozoic Richtersveld Magmatic Arc (southern Namibia), an interpreted mature island-arc setting. We show that the ca. 1886–1881 Ma ore-forming magmas, which originated from a mantle-dominated source with minor crustal contributions, were relatively oxidized (1–2 log units above the fayalite-magnetite-quartz redox buffer) and sulfur-rich. These results indicate that moderately oxidized, sulfur-rich arc magmas associated with porphyry Cu mineralization already existed in the late Paleoproterozoic, probably as a result of recycling of sulfate-rich seawater or sediments from the subducted oceanic lithosphere at that time.